
Fitted Model Measurement

To measure how well a model fits, we compare the model's predictions with the actual data. The most common measures are the mean squared error (MSE) and the root mean squared error (RMSE). Depending on the type of response variable, we use different measures of fit.

Quantitative

The mean squared error (MSE), $\mathbb{E}[(Y - \hat f(X))^2]$, measures how close our estimate $\hat f$ is to the true $f$. On the training data it is estimated by $MSE(\hat f) := \frac{1}{N}\sum_{i = 1}^N (y_i - \hat f(x_i))^2 \approx \mathbb{E}[(Y - \hat f(X))^2]$.

  • We call this the training MSE because it is computed on $D_{train}$.
  • The $\hat f$ we want is $\argmin\limits_{g} MSE(g)$ over $D_{train}$; note that minimizing training MSE alone, especially with a small sample, can lead to overfitting (a minimal numerical sketch follows this list).
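
The following is a minimal Python sketch of the training-MSE computation, assuming entirely made-up data and a simple linear candidate model (names such as `f_hat` and `x_train` are chosen just for this illustration):

```python
import numpy as np

# Hypothetical training data D_train: a noisy linear relationship.
rng = np.random.default_rng(0)
x_train = rng.uniform(0, 10, size=50)
y_train = 2.0 * x_train + 1.0 + rng.normal(scale=1.0, size=50)

# Fit one candidate f_hat (here: a least-squares line) on D_train.
f_hat = np.poly1d(np.polyfit(x_train, y_train, 1))

# Training MSE: average squared residual over D_train.
train_mse = np.mean((y_train - f_hat(x_train)) ** 2)
print(f"training MSE = {train_mse:.3f}")
```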

Once we have such an $\hat f$, we can use $D_{test}$ to look at the test MSE, where $MSE_T(\hat f) = \frac{1}{m}\sum_{i = 1}^m(y_{Ti} - \hat f(x_{Ti}))^2$.

  • If we do not have such a $D_{test}$, we can use a resampling technique called cross-validation (sketched below).
  • The final model is the $\hat f$ that yields the smallest test $MSE$.
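
Below is a hedged sketch of both routes: evaluating test MSE on a held-out $D_{test}$, and estimating it by $k$-fold cross-validation when no test set exists. The data-generating function and all names are invented for the example; only NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data(n):
    # Hypothetical data-generating process: y = 2x + 1 + noise.
    x = rng.uniform(0, 10, size=n)
    return x, 2.0 * x + 1.0 + rng.normal(scale=1.0, size=n)

x_train, y_train = make_data(100)
x_test, y_test = make_data(30)          # plays the role of D_test

# Fit f_hat on D_train, then evaluate its MSE on D_test.
f_hat = np.poly1d(np.polyfit(x_train, y_train, 1))
test_mse = np.mean((y_test - f_hat(x_test)) ** 2)
print(f"test MSE = {test_mse:.3f}")

# If no D_test is available: k-fold cross-validation on D_train alone.
def cv_mse(x, y, deg, k=5):
    folds = np.array_split(rng.permutation(len(x)), k)
    errors = []
    for held_out in folds:
        mask = np.ones(len(x), dtype=bool)
        mask[held_out] = False           # fit on the other k-1 folds ...
        g = np.poly1d(np.polyfit(x[mask], y[mask], deg))
        errors.append(np.mean((y[~mask] - g(x[~mask])) ** 2))  # ... score on the held-out fold
    return np.mean(errors)

print(f"5-fold CV estimate of test MSE = {cv_mse(x_train, y_train, deg=1):.3f}")
```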

Notice that there is a bias-variance trade-off hidden inside the MSE. For a given $x$, the conditional expected $MSE$ is:

$$\mathbb{E}[(Y - \hat f(X))^2 \mid X = x] = Var(\hat f(x)) + (\mathbb{E}[\hat f(x)] - f(x))^2 + Var(\epsilon) \ge Var(\epsilon)$$

  • $Var(\hat f(x))$ is the variance, which describes how much $\hat f$ would change if we estimated it using a different training data set.
  • $(\mathbb{E}[\hat f(x)] - f(x))^2$ is the squared bias, which refers to the error introduced by approximating $f$ with a parametrized model.
  • $Var(\epsilon)$ is the irreducible error.
  • The ideal $\hat f$ should also minimize this conditional expected MSE (a derivation of the decomposition follows this list).
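
For completeness, here is the standard derivation behind this decomposition. It assumes the usual model $Y = f(x) + \epsilon$ with $\mathbb{E}[\epsilon] = 0$ and $\epsilon$ independent of the training data used to build $\hat f$ (assumptions that are implicit in the notes above):

```latex
\begin{aligned}
\mathbb{E}\big[(Y - \hat f(x))^2 \mid X = x\big]
  &= \mathbb{E}\big[(f(x) + \epsilon - \hat f(x))^2\big] \\
  &= \mathbb{E}\big[(f(x) - \hat f(x))^2\big]
     + 2\,\mathbb{E}[\epsilon]\,\mathbb{E}\big[f(x) - \hat f(x)\big]
     + \mathbb{E}[\epsilon^2] \\
  &= Var(\hat f(x)) + \big(\mathbb{E}[\hat f(x)] - f(x)\big)^2 + Var(\epsilon) \\
  &\ge Var(\epsilon)
\end{aligned}
```

The cross term vanishes because $\mathbb{E}[\epsilon] = 0$ and $\epsilon$ is independent of $\hat f$, the last term equals $Var(\epsilon)$ since $\mathbb{E}[\epsilon] = 0$, and $\mathbb{E}[(f(x) - \hat f(x))^2]$ expands around $\mathbb{E}[\hat f(x)]$ into variance plus squared bias.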

From the conditional expected $MSE$ we can draw the following conclusions:

  • More complexity (a more flexible parametrization of $\hat f$) increases the variance of $\hat f$ while its bias decreases.
  • Conversely, less complexity leads to a smaller variance of $\hat f$ but a larger bias.
  • When two candidates $\hat f$ built from different perspectives have similar expected MSEs, we prefer the less complex one.

Training MSE always decreases as we increase the complexity of $\hat f$.

  • Selecting the model with the smallest training MSE may therefore lead to overfitting; instead, we should select the model based on test MSE (illustrated in the sketch below).
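
A quick way to see this behavior numerically is the toy simulation below, where polynomial degree stands in for model complexity; the sinusoidal truth and all parameters are made up for the illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

def make_data(n):
    # Hypothetical nonlinear truth: y = sin(x) + noise.
    x = rng.uniform(0, 6, size=n)
    return x, np.sin(x) + rng.normal(scale=0.3, size=n)

x_train, y_train = make_data(30)
x_test, y_test = make_data(200)

for deg in (1, 3, 5, 9, 12):
    f_hat = np.poly1d(np.polyfit(x_train, y_train, deg))
    train_mse = np.mean((y_train - f_hat(x_train)) ** 2)
    test_mse = np.mean((y_test - f_hat(x_test)) ** 2)
    # Training MSE keeps shrinking as the degree grows; test MSE eventually rises again.
    print(f"degree {deg:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```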

We can also obtain $\hat f$ through the sum of absolute differences (SAD), averaged over the sample: $\mathbb{E}[|Y - \hat f(X)|] \approx \frac{1}{n}\sum_{i=1}^n |y_i - \hat f(x_i)|$.

  • Notice that both MSE and SAD are criteria for quantitative $Y$ (a small comparison is sketched below).
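
A tiny comparison of the two criteria on the same residuals; the numbers are made up purely for illustration, and the squared criterion penalizes the single large residual much more heavily than the absolute one:

```python
import numpy as np

y = np.array([3.0, -1.0, 2.5, 0.0])        # hypothetical observed y_i
y_hat = np.array([2.5, -0.5, 4.0, 0.2])    # hypothetical fitted values f_hat(x_i)

mse = np.mean((y - y_hat) ** 2)            # squared-error criterion
sad = np.mean(np.abs(y - y_hat))           # averaged absolute-difference criterion
print(f"MSE = {mse:.3f}, SAD = {sad:.3f}")
```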

Categorical/Ordinal

For a categorical or ordinal $Y$ we use the expected error rate, defined as $\mathbb{E}[1\{Y \ne \hat f(X)\}]$.

  • The training error rate is $\frac{1}{n}\sum_{i=1}^n 1\{y_i \ne \hat f(x_i)\}$
  • The test error rate is $\frac{1}{m}\sum_{i=1}^m 1\{y_{Ti} \ne \hat f(x_{Ti})\}$
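
To make the two rates concrete, here is a minimal sketch with a toy 1-nearest-neighbor classifier standing in for $\hat f$; the data, the noise level, and every name are assumptions made only for this example.

```python
import numpy as np

rng = np.random.default_rng(3)

def make_data(n):
    # Hypothetical labels: class 1 if x > 0, observed with 10% label noise.
    x = rng.normal(size=n)
    y = (x > 0).astype(int)
    flip = rng.random(n) < 0.1
    return x, np.where(flip, 1 - y, y)

x_train, y_train = make_data(200)
x_test, y_test = make_data(100)

def f_hat(x_new):
    # Toy 1-nearest-neighbor classifier built from D_train.
    idx = np.abs(x_train[None, :] - np.asarray(x_new)[:, None]).argmin(axis=1)
    return y_train[idx]

train_error = np.mean(f_hat(x_train) != y_train)   # training error rate
test_error = np.mean(f_hat(x_test) != y_test)      # test error rate
print(f"training error rate = {train_error:.3f}, test error rate = {test_error:.3f}")
```

The gap between the two rates (the training rate is essentially zero for 1-nearest-neighbor) is the same overfitting phenomenon discussed for MSE above.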